Survival analysis is often used in cancer studies. It has been shown that combining clinical data with genomic data increases the predictive performance of survival analysis methods. This tool provides a wide range of survival analysis methods for genomics research, especially in cancer studies, including Kaplan-Meier, Cox regression, penalized Cox regression and random survival forests. It also offers methods for determining optimal cutoff points for continuous markers.
Each procedure includes the following features:
Kaplan-Meier: descriptive statistics, survival table, mean and median life time, hazard ratios, comparison tests (log-rank, Gehan-Breslow, Tarone-Ware, Peto-Peto, modified Peto-Peto and Fleming-Harrington), and interactive plots such as Kaplan-Meier curves and hazard plots.
Cox regression: coefficient estimates, hazard ratios, goodness of fit tests, analysis of deviance, options to save predictions, residuals, Martingale residuals, Schoenfeld residuals and DfBetas, a proportional hazards assumption test, and interactive plots including the Schoenfeld residual plot and the log-minus-log plot.
Penalized Cox regression: feature selection using ridge, elastic net and lasso penalization, and cross-validation to investigate the relationship between partial likelihood deviance and lambda values.
Random survival forests: overall survival predictions (Nelson-Aalen estimator, overall ensemble), individual survival predictions (with OOB), individual cumulative hazard predictions (with OOB), error rate, variable importance, and interactive plots including the random survival plot, cumulative hazard plot, error rate plot and Cox vs RSF plot.
Optimal cutoff: determination of the optimal cutoff value by maximizing a test statistic (log-rank, Gehan-Breslow, Tarone-Ware, Peto-Peto, modified Peto-Peto or Fleming-Harrington).
This tool requires a dataset in .txt format whose columns are separated by a comma, semicolon, space or tab delimiter. The first row of the dataset must contain the header. When an appropriate file is uploaded, the dataset appears immediately on the main page of the tool. Alternatively, users can load one of the example datasets provided within the tool to test it and to understand how it works.
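To make the file requirements concrete, the sketch below reads such a file: it picks the first candidate delimiter that splits every line into the same number of fields and uses the first row as the header. This is a hypothetical helper written for illustration (the names `read_dataset`, `detect_delimiter` and the delimiter order are assumptions), not the tool's own reader.

```python
# Hypothetical helper, not the tool's own reader: parse a delimited .txt file
# whose first row is the header.
CANDIDATE_DELIMITERS = [",", ";", "\t", " "]  # assumed order of preference


def split_line(line, delim):
    """Split one line; a space delimiter collapses runs of whitespace."""
    if delim == " ":
        return line.split()
    return [field.strip() for field in line.split(delim)]


def detect_delimiter(lines):
    """Return the first candidate that splits every line into the same number (> 1) of fields."""
    for delim in CANDIDATE_DELIMITERS:
        counts = {len(split_line(line, delim)) for line in lines}
        if len(counts) == 1 and counts.pop() > 1:
            return delim
    raise ValueError("no consistent delimiter found")


def read_dataset(text):
    """Return the data rows as dicts keyed by the header fields."""
    lines = [line for line in text.splitlines() if line.strip()]
    delim = detect_delimiter(lines)
    header = split_line(lines[0], delim)
    return [dict(zip(header, split_line(line, delim))) for line in lines[1:]]
```

For example, `read_dataset("time;status;group\n5;1;A\n8;0;B\n")` detects the semicolon and returns one dict per data row. Values are kept as strings; converting survival times to numbers is left to the caller.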
Kaplan-Meier is a non-parametric statistical method used to estimate survival probabilities and hazard ratios for a cohort study group. In clinical trials, it is often used to measure the fraction of patients who survive for a certain period of time after a treatment.
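For reference, the standard product-limit form of the Kaplan-Meier estimator is shown below. The notation is introduced here rather than taken from the tool: t_i are the distinct event times, d_i is the number of events at t_i, and n_i is the number of subjects at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored subjects contribute to the risk sets n_i up to their censoring time but never to the event counts d_i.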
The following variables are used:
Survival time: time until an event occurs (e.g. in days, weeks, months or years)
Status variable: the variable recording the event (e.g. death, disease, remission, recovery)
Category value for status variable: the value that codes the event of interest (e.g. 1, yes)
Factor variable: a categorical variable that defines different study groups (e.g. treatment, gender)
A Kaplan-Meier analysis can be conducted by applying the following steps:
1. Select Kaplan Meier from the Analysis tab.
2. Select the survival time, status variable, category value for the status variable and, if one exists, the factor variable.
3. Click the Run button to run the analysis.
Desired outputs can be selected by clicking the Outputs checkbox. Available outputs are:
Summary statistics, such as the number and percentage of observations, events and censored cases, can be obtained.
A survival table can be created. The first column identifies the factor group and the time point within it (e.g. 1.2 denotes the second time point in the first factor group, and 2.1 the first time point in the second group). The second column is the survival time, the third gives the number of subjects at risk, the fourth the number of events, and the fifth the cumulative survival probability; the sixth, seventh and eighth columns are its standard error and lower and upper confidence limits, respectively.
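The columns of this table can be reproduced for a single factor group with the sketch below. It applies the product-limit estimator and Greenwood's standard error; the plain Wald confidence limits clipped to [0, 1] are an assumption, since the interval type the tool uses is not stated. The function name `survival_table` is hypothetical.

```python
import math
from statistics import NormalDist


def survival_table(times, events, conf_level=0.95):
    """Kaplan-Meier survival table for a single factor group.

    Each row is (time, n_at_risk, n_events, survival, std_err, lower, upper).
    Standard errors use Greenwood's formula; the limits are plain Wald limits
    clipped to [0, 1] (an assumption: the tool's interval type is not stated).
    """
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv, greenwood_sum, rows = 1.0, 0.0, []
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk just before t
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1 - d / n
        if n > d:  # Greenwood term is undefined once survival reaches 0
            greenwood_sum += d / (n * (n - d))
        se = surv * math.sqrt(greenwood_sum)
        rows.append((t, n, d, surv, se, max(0.0, surv - z * se), min(1.0, surv + z * se)))
    return rows
```

With times `[1, 2, 2, 3, 4]` and status `[1, 1, 0, 1, 0]`, the survival column steps through 0.8, 0.6 and 0.3 at times 1, 2 and 3.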
A forest plot can be created for each level of the factor variable using survival probabilities at each end point.
Mean and median life times and their confidence intervals can be calculated for each level of the factor variable.
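One common way to derive these quantities from a Kaplan-Meier curve is sketched below: the median is the first time the curve drops to 0.5 or below, and the mean is the area under the step function up to a horizon tau (the restricted mean). These conventions are assumptions; software differs, for example in how it treats a curve that sits exactly at 0.5 or which horizon it uses for the mean. The input `steps` is a hypothetical list of (time, survival) pairs.

```python
def median_survival(steps):
    """First time at which the KM curve drops to 0.5 or below (None if it never does)."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None


def restricted_mean(steps, tau):
    """Area under the KM step function from 0 to tau (restricted mean survival time)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # curve is flat at prev_s until the next drop
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)
```

For the curve `[(1, 0.8), (2, 0.6), (3, 0.3)]`, the median is 3 and the mean restricted to tau = 4 is 1 + 0.8 + 0.6 + 0.3 = 2.7.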
Hazard ratios and their respective lower and upper limits can be calculated for each factor group at each end point.
A forest plot can be created for each level of the factor variable using hazard ratios at each end point.
Six comparison tests (log-rank, Gehan-Breslow, Tarone-Ware, Peto-Peto, modified Peto-Peto and Fleming-Harrington) can be calculated to test for differences in survival between factor groups.
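Several of these tests are weighted versions of the log-rank test, differing only in the weight given to each event time. The two-group sketch below covers the log-rank (weight 1), Gehan-Breslow (weight = number at risk) and Tarone-Ware (weight = square root of the number at risk) cases; Peto-Peto and Fleming-Harrington use survival-based weights and are not covered. The function name and its arguments are hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank test; returns (chi_square, p_value) with 1 df.

    weight: "logrank" (w = 1), "gehan" (Gehan-Breslow, w = n at risk) or
    "tarone" (Tarone-Ware, w = sqrt(n at risk)).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    g1 = labels[0]
    u = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(1 for g in at_risk if g == g1)
        died = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        d, d1 = len(died), sum(1 for g in died if g == g1)
        w = {"logrank": 1.0, "gehan": float(n), "tarone": math.sqrt(n)}[weight]
        u += w * (d1 - d * n1 / n)  # observed minus expected events in group 1
        if n > 1:  # hypergeometric variance of d1
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = u * u / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

The p value uses the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable with one degree of freedom, which avoids a SciPy dependency.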
Kaplan-Meier curves can be created, with a number of edit options available.
A hazard plot can be created, with a number of edit options available.
A log-minus-log plot can be created, with a number of edit options available.
Cox regression, also known as proportional hazards regression, is a method for investigating the effect of one or more factors (e.g. gene expression levels) on the time until an event of interest occurs. In this model, the effect of a unit increase in a factor is multiplicative with respect to the hazard rate.
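In standard notation (symbols introduced here, not taken from the tool), the model and the multiplicative effect of a one-unit increase in x_j, with the other covariates held fixed, are:

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\, \exp(\beta_1 x_1 + \cdots + \beta_p x_p),
\qquad
\frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = e^{\beta_j}
```

The baseline hazard h_0(t) cancels in the ratio, which is why e^{\beta_j} is reported as the hazard ratio and why it is assumed constant over time.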
A Cox regression analysis can be conducted by applying the following steps:
1. Select Cox Regression from the Analysis tab.
2. Select the survival time, status variable, category value for the status variable, and the categorical and continuous predictors for the model.
3. Optionally, add interaction terms, strata terms and time-dependent covariates to the model. If there are multiple records per observation, specify this by checking the Multiple ID checkbox. One can also choose the model selection criterion (AIC or p-value), the model selection method (backward, forward or stepwise), the reference category (first or last) and the ties method (Efron, Breslow or exact), and change the confidence level.
4. Click the Run button to run the analysis.
Desired outputs can be selected by clicking the Outputs checkbox. Available outputs are coefficient estimates, hazard ratios, goodness of fit tests, analysis of deviance, predictions, residuals, Martingale residuals, Schoenfeld residuals and DfBetas.
A coefficient estimation table, which includes variable names, coefficient estimates and their associated standard errors, z statistics and p values, can be created.
A hazard ratio table, which includes variable names, hazard ratios and their associated lower and upper limits, can be created.
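Each row of the hazard ratio table follows from the corresponding coefficient and its standard error. The sketch below assumes Wald-type limits computed on the log scale, which is the usual convention; the function name `hazard_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def hazard_ratio_ci(beta, se, conf_level=0.95):
    """Hazard ratio exp(beta) with Wald limits exp(beta - z * se) and exp(beta + z * se)."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)  # about 1.96 for 95%
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Because the limits are symmetric on the log scale, they are asymmetric around the hazard ratio itself, and their product equals the squared hazard ratio.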
A forest plot of the hazard ratios can be created for visual inspection.
The fitted Cox regression model can be tested with three global tests: the likelihood ratio, Wald and score tests.
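To show how the three tests relate, the sketch below fits a one-covariate Cox model by Newton-Raphson on the log partial likelihood and computes all three statistics. It uses the Breslow approximation for ties (the tool also offers Efron and exact handling) and is a simplified illustration, not the tool's fitting routine; `cox_fit_1d` is a hypothetical name.

```python
import math


def cox_fit_1d(times, events, x, iterations=25):
    """One-covariate Cox fit (Breslow ties) with likelihood ratio, Wald and score statistics."""

    def loglik_score_info(beta):
        ll = score = info = 0.0
        for ti, ei, xi in zip(times, events, x):
            if not ei:
                continue
            # Risk set: everyone still under observation at time ti
            risk = [(xj, math.exp(beta * xj)) for tj, xj in zip(times, x) if tj >= ti]
            s0 = sum(w for _, w in risk)
            s1 = sum(xj * w for xj, w in risk)
            s2 = sum(xj * xj * w for xj, w in risk)
            ll += beta * xi - math.log(s0)
            score += xi - s1 / s0  # first derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # minus the second derivative
        return ll, score, info

    beta = 0.0
    for _ in range(iterations):
        _, score, info = loglik_score_info(beta)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < 1e-10:
            break
    ll_hat, _, info_hat = loglik_score_info(beta)
    ll_0, score_0, info_0 = loglik_score_info(0.0)
    return {
        "beta": beta,
        "se": 1 / math.sqrt(info_hat),
        "likelihood_ratio": 2 * (ll_hat - ll_0),
        "wald": beta * beta * info_hat,
        "score": score_0 * score_0 / info_0,
    }
```

All three statistics test the same null hypothesis (beta = 0) and are compared with a chi-square distribution with one degree of freedom here. They usually agree closely in large samples and can differ in small ones.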
An analysis of deviance can be conducted for each variable in the fitted model.
Predictions from the fitted model can be obtained.
Residuals from the fitted model can be obtained.
Martingale residuals from the fitted model can be obtained.
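A Martingale residual compares each subject's observed event indicator with the number of events the model expects for that subject up to its observed time. The one-covariate sketch below assumes the Breslow estimator of the baseline cumulative hazard; the tool's exact estimator (for example with Efron ties) may differ, and the function name is hypothetical.

```python
import math


def martingale_residuals(times, events, x, beta):
    """M_i = delta_i - H0(t_i) * exp(beta * x_i), with the Breslow baseline cumulative hazard H0."""
    risk_score = [math.exp(beta * xi) for xi in x]

    def baseline_cumhaz(t):
        total = 0.0
        for tk in sorted({tk for tk, ek in zip(times, events) if ek and tk <= t}):
            d = sum(1 for tj, ej in zip(times, events) if tj == tk and ej)
            s0 = sum(r for tj, r in zip(times, risk_score) if tj >= tk)
            total += d / s0  # Breslow increment at event time tk
        return total

    return [e - baseline_cumhaz(t) * r for t, e, r in zip(times, events, risk_score)]
```

With this estimator the residuals sum to zero. Large negative values flag subjects who lived much longer than the model predicts.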
Schoenfeld residuals from the fitted model can be obtained.
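Schoenfeld residuals are defined only at event times. At each event, the residual is the covariate value of the subject who had the event minus the risk-weighted average covariate over everyone still at risk. The sketch below covers one covariate with the Breslow risk set and uses a hypothetical function name.

```python
import math


def schoenfeld_residuals(times, events, x, beta):
    """Return (event_time, residual) pairs for a one-covariate Cox model, sorted by time."""
    out = []
    for ti, ei, xi in zip(times, events, x):
        if not ei:
            continue  # censored subjects have no Schoenfeld residual
        risk = [(xj, math.exp(beta * xj)) for tj, xj in zip(times, x) if tj >= ti]
        weighted_mean = sum(xj * w for xj, w in risk) / sum(w for _, w in risk)
        out.append((ti, xi - weighted_mean))
    return sorted(out)
```

Plotting these residuals against time, usually with a smoother, gives the Schoenfeld plot. A systematic trend suggests that the covariate's effect changes over time.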
DfBetas residuals from the fitted model can be obtained.
To check the proportional hazards assumption of the Cox regression model, a proportional hazards test can be conducted both globally and for each variable in the fitted model.
Besides a formal test of the proportionality assumption, a Schoenfeld residual plot can be created to check the assumption visually.
Another useful plot for checking the proportionality assumption is the log-minus-log plot: the curves for the different groups should be roughly parallel if the assumption holds.
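The reason the curves should be parallel: under proportional hazards, S_1(t) = S_0(t)^exp(beta), so log(-log S_1(t)) = log(-log S_0(t)) + beta, and the two curves differ by a constant vertical shift. The sketch below applies this transform to hypothetical Kaplan-Meier points, skipping points where it is undefined.

```python
import math


def log_minus_log(steps):
    """Map KM points (t, S(t)) to (log t, log(-log S(t))), skipping t <= 0, S = 0 and S = 1."""
    return [(math.log(t), math.log(-math.log(s))) for t, s in steps if t > 0 and 0 < s < 1]
```

If a second group's curve is generated as `s ** math.exp(0.7)` from the first, its transformed points sit exactly 0.7 above the first group's at every time.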
Feature selection is a useful strategy to avoid over-fitting, to obtain more reliable predictive results, and to provide more insight into the underlying causal relationships (Ma and Huang, 2008). In this section, feature selection can be performed using a ridge, elastic net or lasso penalty, especially when there are many more predictors than observations (p >> n). More information can be found in Zou and Hastie (2005), Friedman et al. (2008) and Simon et al. (2011).
A Penalized Cox regression analysis can be conducted by applying the following steps:
1. Select Penalized Cox Regression from the Analysis tab.
2. Select the survival time and status variable.
3. Check the Select All Variables option to include all variables in the dataset in the feature selection process. If some predictors are categorical and others are continuous, uncheck the Select All Variables option and select the categorical and continuous variables separately.
4. Set the Penalty term slider as follows:
Penalty term = 0: ridge penalty
0 < Penalty term < 1: elastic net penalty
Penalty term = 1: lasso penalty
5. Click the Run button to run the analysis.
Variable selection is conducted with the selected penalized method (ridge, elastic net or lasso), and the results are displayed as a table of the selected variables and their coefficient estimates.
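The Penalty term slider appears to play the role of the elastic net mixing parameter (alpha) in glmnet, the software described in Friedman et al. (2008) and Simon et al. (2011). Assuming that parameterization, the sketch below computes the penalty that the fit trades off against the log partial likelihood. The function name is hypothetical, and the scaling of the likelihood term is not shown.

```python
def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style penalty: lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1).

    alpha = 0 gives ridge, alpha = 1 gives lasso, values in between give elastic net.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l2 = sum(b * b for b in beta)  # squared L2 norm: shrinks, never exactly zero
    l1 = sum(abs(b) for b in beta)  # L1 norm: can set coefficients exactly to zero
    return lam * ((1 - alpha) / 2 * l2 + alpha * l1)
```

The L1 part is what drives coefficients to exactly zero. For this reason lasso and elastic net perform variable selection, whereas a pure ridge penalty only shrinks coefficients.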
A cross-validation curve can be created to investigate the relationship between partial likelihood deviance and lambda values.
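This text does not say how a lambda value is chosen from the curve. One common convention, used by glmnet's `cv.glmnet`, reports two values: the lambda with the smallest mean cross-validated deviance, and the largest lambda whose deviance is within one standard error of that minimum. The sketch below assumes that convention and uses hypothetical inputs.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation curve.

    lambda_min minimises the mean CV deviance; lambda_1se is the largest lambda
    whose mean deviance is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambdas[i_min], lambda_1se
```

The one-standard-error choice gives a sparser model, with fewer selected genes, at little cost in cross-validated deviance.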
Random survival forests (RSF), an ensemble method for analysing right-censored data, were introduced by Ishwaran et al. (2008). RSF has several advantages over Cox regression: (i) unlike Cox regression, RSF does not rely on the proportional hazards assumption; (ii) RSF can account for nonlinear effects and interactions among variables.
A random survival forests analysis can be conducted by applying the following steps:
1. Select Random Survival Forests from the Analysis tab.
2. Select the survival time, status variable, category value for the status variable, and the categorical and continuous predictors for the model.
3. Optionally, add interaction terms, strata terms and time-dependent covariates to the model. If there are multiple records per observation, specify this by checking the Multiple ID checkbox. Under RSF options, the number of trees, bootstrap method, number of randomly selected variables, minimum number of cases in a terminal node, maximum tree depth, splitting rule, number of split points, missing value handling, number of iterations of the missing data algorithm, proximity of cases, bootstrap sample size and bootstrap type can be adjusted.
4. Click the Run button to run the analysis.
Survival predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
Out of bag (OOB) survival predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
Cumulative hazard predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
Out of bag (OOB) cumulative hazard predictions for each observation can be obtained. In this table, rows represent observations whereas columns represent time endpoints.
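Conceptually, an out-of-bag (OOB) prediction for a case averages only the trees whose bootstrap sample did not contain that case, so it behaves like a prediction on unseen data. The sketch below shows that averaging on hypothetical per-tree predicted curves. It is an illustration only; the forest software computes its ensembles internally.

```python
def oob_ensemble(tree_predictions, inbag):
    """Average each case's predicted curve over the trees for which the case was out-of-bag.

    tree_predictions[b][i] is tree b's curve (a list over time points) for case i;
    inbag[b][i] is True if case i was in tree b's bootstrap sample.
    """
    n_cases = len(tree_predictions[0])
    result = []
    for i in range(n_cases):
        curves = [preds[i] for preds, bag in zip(tree_predictions, inbag) if not bag[i]]
        if not curves:
            result.append(None)  # case was in-bag for every tree: no OOB prediction
            continue
        result.append([sum(vals) / len(curves) for vals in zip(*curves)])
    return result
```

With many trees, each case is out-of-bag for roughly a third of them under standard bootstrap sampling, so missing OOB predictions are rare in practice.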
An error rate table, which shows the error rate estimate for each tree, can be obtained.
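The error measure is not defined in this text. The sketch below assumes the tool reports the usual random survival forest error rate, 1 minus Harrell's concordance index (C-index). The C-index is the fraction of usable pairs in which the subject with the shorter observed time, who had an event, also received the higher predicted risk. Pairs with tied times are skipped here for simplicity, and the function names are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index: higher predicted risk should mean shorter survival.

    A pair (i, j) is usable when times[i] < times[j] and subject i had an event.
    Ties in predicted risk count as 1/2; pairs with tied times are skipped.
    """
    concordant = usable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


def error_rate(times, events, risk):
    """Error rate as 1 - C: 0 is perfect ranking, 0.5 is no better than chance."""
    return 1.0 - harrell_c(times, events, risk)
```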
A variable importance table as well as an interactive plot, which shows relative importance of variables in fitted model, can be obtained.
A survival plot can be created based on Nelson-Aalen estimator and overall ensemble predictions.
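The Nelson-Aalen estimator used in this comparison sums, over event times, the number of events divided by the number at risk. A survival curve follows as S(t) = exp(-H(t)). The sketch below computes it at the distinct event times; the function name is hypothetical.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard H(t) = sum over event times t_k <= t of d_k / n_k.

    Returns [(t_k, H(t_k)), ...] at the distinct event times.
    """
    out, cumhaz = [], 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)  # at risk just before t
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        cumhaz += d / n
        out.append((t, cumhaz))
    return out
```

For three subjects with events at times 1, 2 and 3, the cumulative hazard is 1/3, 1/3 + 1/2 and 1/3 + 1/2 + 1.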
A survival plot can be drawn for survival predictions from random survival forests model. Each line represents a survival curve for each observation.
A survival plot can be drawn for OOB survival predictions from random survival forests model. Each line represents a survival curve for each observation.
A cumulative hazard plot can be drawn for cumulative hazard predictions from the random survival forests model. Each line represents the cumulative hazard curve of one observation.
A cumulative hazard plot can be drawn for OOB cumulative hazard predictions from the random survival forests model. Each line represents the cumulative hazard curve of one observation.
An interactive error rate plot, which shows how the error rate changes as the number of trees increases, can be drawn.
A Cox model can be compared with a random survival forests model through an interactive plot for visual inspection of both models.
To investigate whether higher or lower expression of differentially expressed genes is associated with greater survival risk for patients, gene expression levels can be dichotomized at cutoff values chosen by maximizing a test statistic.
An optimal cutoff value can be determined by applying the following steps:
1. Select Optimal Cutoff from the Analysis tab.
2. Select the marker(s) in the Select marker(s) box.
3. Select the survival time and status variable.
4. Click the Run button to run the analysis.
The optimal cutoff value is reported together with the hazard ratio (HR) and its confidence interval, the mean survival time for the low and high expression groups, and the p value of the selected significance test.
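The search can be sketched as follows. Each observed marker value is tried as a cutoff, the cases are split into high (above the cutoff) and low groups, and the cutoff with the largest log-rank statistic is kept. This sketch uses only the log-rank test. The minimum group size `min_group` is a hypothetical parameter, and the returned p value is the unadjusted one for the chosen split. Because the split was selected by maximizing the statistic, methods such as maximally selected rank statistics adjust this p value; the sketch does not.

```python
import math


def logrank_chi2(times, events, high):
    """Two-group log-rank chi-square statistic (1 df); high is a list of booleans."""
    u = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [h for ti, h in zip(times, high) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, h in zip(times, events, high) if ti == t and e and h)
        u += d1 - d * n1 / n  # observed minus expected events in the high group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u * u / var if var > 0 else 0.0


def optimal_cutoff(marker, times, events, min_group=2):
    """Return (cutoff, chi2, unadjusted p value) maximising the log-rank statistic."""
    best = None
    for c in sorted(set(marker)):
        high = [m > c for m in marker]
        if min(sum(high), len(high) - sum(high)) < min_group:
            continue  # skip splits that leave a group too small
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            best = (c, chi2)
    if best is None:
        raise ValueError("no admissible cutoff")
    cutoff, chi2 = best
    return cutoff, chi2, math.erfc(math.sqrt(chi2 / 2))
```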
A Kaplan-Meier plot can be created after dichotomizing the gene expression level into high and low groups.